-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-53573][SQL] IDENTIFIER everywhere #52765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e059de9 to
b1b4bba
Compare
b1b4bba to
4cbcebe
Compare
… maintainability - Extract IDENTIFIER_PREFIX constant for magic string - Improve getMultipartIdentifierText documentation with complete Scaladoc - Narrow exception handling from Exception to ParseException - Remove redundant ParserUtils prefix in class methods - Fix qualified label validation to check resolved identifiers - Ensure all comments are complete sentences ending with periods - Remove dead code and improve variable naming - Fix FOR loop variable resolution to use getMultipartIdentifierText All tests pass: - SqlScriptingParserSuite (qualified label validation) - SqlScriptingE2eSuite (identifier tests) - SQLQueryTestSuite (identifier-clause and identifier-clause-legacy)
move imports to head
dtenedor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial round of review. Thanks for working on this!
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/inputs/identifier-clause-legacy.sql
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
Outdated
Show resolved
Hide resolved
… for loop label to strictIdentifier
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
|
thanks, merging to master/4.1! |
### What changes were proposed in this pull request? We propose expanding the IDENTIFIER() clause, which turns a string into a qualified identifier, to all places identifiers can appear. The current clause is severely limited in where it can go because it accepts constant expressions, including session variables. Due to the complexity of the argument the existing clause requires tricky code to incrementally analyze its arguments and then execute sections of parser code at a later point. By contrast the generalized IDENTIFIER clause only allows string literals which can be processed in the visitor methods. Due to the rework of parameter markers and string coalescing this allows for constructs such as: ``` SELECT * FROM IDENTIFIER(:cat '.' :schema '.' :table) ``` it even allows: ``` SELECT 'hello' AS IDENTIFIER(:alias); ``` This is really all identifier() needs. We may be able to deprecate and de-support the existing too complex identifier() implementation. ### Why are the changes needed? IDENTIFIER() is a popular feature, but it can only be used in very specific, hard to reason about places. The new implementation preserved 99% of teh fucntion while expanding its use to everywhere. ### Does this PR introduce _any_ user-facing change? Yes, it's a new feature ### How was this patch tested? expanded Parameters and inentifier-clause test suites. ### Was this patch authored or co-authored using generative AI tooling? Yes, Clause Sonnet 4.5 Closes #52765 from srielau/identifier-lite. Authored-by: Serge Rielau <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit daa29f6) Signed-off-by: Wenchen Fan <[email protected]>
|
Hi, @srielau and @cloud-fan . Is this a mistake? Or, did you want to make a huge followup? At the first glance, the JIRA Issue title looks weird to me. |
This was intentional, albeit perhaps improper procedure.
I will work with @cloud-fan to structure it into subtasks |
| * @see | ||
| * [[org.apache.spark.sql.catalyst.parser.AstBuilder]] for the full SQL statement parser | ||
| * | ||
| * ==CRITICAL: Extracting Identifier Names== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this CRITICAL: mean line 72 ~ 73?
- '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
- handle the IDENTIFIER('literal') syntax and will cause incorrect behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. ideally I would have liked to block or override getText() but have not found a way to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Got it~
| * Example: | ||
| * {{{ | ||
| * // WRONG - does not handle IDENTIFIER('literal'): | ||
| * val name = ctx.identifier.getText |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, is this always wrong? What about the currently remaining (existing) code in AstBuilder.scala and SparkSqlParser.scala like the following?
$ git grep ctx.identifier.getText
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: * '''DO NOT use ctx.getText() or ctx.identifier.getText()''' directly! These methods do not
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: * val name = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: val collationName = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: lazy val name: String = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: lazy val name: String = ctx.identifier.getText
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: val indexName = ctx.identifier.getText
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ctx.identifier.getText.toLowerCase(Locale.ROOT) match {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! These slipped through the cracks. Must have fat-fingered my own grep to miss out on those.
I'll create a follow up.
What changes were proposed in this pull request?
We propose expanding the IDENTIFIER() clause, which turns a string into a qualified identifier, to all places identifiers can appear. The current clause is severely limited in where it can go because it accepts constant expressions, including session variables.
Due to the complexity of the argument the existing clause requires tricky code to incrementally analyze its arguments and then execute sections of parser code at a later point.
By contrast the generalized IDENTIFIER clause only allows string literals which can be processed in the visitor methods.
Due to the rework of parameter markers and string coalescing this allows for constructs such as:
it even allows:
This is really all identifier() needs. We may be able to deprecate and de-support the existing too complex identifier() implementation.
Why are the changes needed?
IDENTIFIER() is a popular feature, but it can only be used in very specific, hard to reason about places.
The new implementation preserved 99% of teh fucntion while expanding its use to everywhere.
Does this PR introduce any user-facing change?
Yes, it's a new feature
How was this patch tested?
expanded Parameters and inentifier-clause test suites.
Was this patch authored or co-authored using generative AI tooling?
Yes, Clause Sonnet 4.5